Monitoring network traffic to determine similar content

ABSTRACT

In an embodiment, a method monitors a plurality of data streams passing through a router in a connectivity service provider environment and, for each of the data streams, periodically samples packets at the router. The method further generates a stream signature based at least on the payload of the sampled packets. The method further includes, for each generated stream signature, attaching information to the stream signature. Such information may, for example, include time-stamp information for the stream signature or an identification of the router. The method may further comprise storing the stream signatures corresponding to the data streams in a database. The stored stream signatures may be compared to determine matching stream signatures. Matching signatures may identify data streams that carry identical or similar content.

BACKGROUND

Field

This disclosure generally relates to network traffic monitoring.

Related Art

Network services may, for example, provide connectivity from a customer network to another computer network, such as the Internet. A client connects to a server using a connectivity service provided by a connectivity service provider. Multiple routers in a connectivity service provider environment may concurrently carry traffic with identical or similar content to one or more clients, leading to inefficient usage of network resources.

A data stream transmitted from a content service provider (e.g., a media content service provider or a voice-over-IP (VoIP) service provider) to a client (e.g., an end user) is typically identified by a series of data packets sharing source and destination addresses and protocol information. For example, packets in a data stream may have a common source IP address, source port, destination IP address, destination port, and protocol. In a specific example, a data stream may correspond to a media streaming session for a client streaming a movie from a media content provider.
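
The five-tuple identification described above can be illustrated with a minimal Python sketch. It groups packets into streams keyed by (source IP, source port, destination IP, destination port, protocol); the dictionary packet format and the names `FlowKey` and `group_into_streams` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Five-tuple that identifies a data stream."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def group_into_streams(packets):
    """Group packets into data streams keyed by their five-tuple.

    Each packet is assumed to be a dict with the keys src_ip, src_port,
    dst_ip, dst_port, protocol and payload (a hypothetical format).
    """
    streams = defaultdict(list)
    for pkt in packets:
        key = FlowKey(pkt["src_ip"], pkt["src_port"],
                      pkt["dst_ip"], pkt["dst_port"], pkt["protocol"])
        streams[key].append(pkt)
    return dict(streams)
```

If the same movie is sent from one server to two clients, the packets land under two different keys because the destination address differs, which is why header fields alone cannot reveal that two streams carry the same content.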

In a connectivity service provider environment, two or more clients may be simultaneously receiving the same content from a common content provider. For example, two end users may be streaming the same movie concurrently. Alternatively, one end user may be receiving a very heavy (high-bandwidth) data stream that is sent to the end user through two or more paths at the same time. In either case, two or more routers in the connectivity service provider environment may be relaying similar or identical content to one or more clients.

A connectivity service provider may be interested in knowing that two or more of its clients are concurrently receiving similar or identical data streams. The service provider may use this information for a variety of purposes. For example, it may leverage this information to provide a better route for such a data stream through its routers, improving the efficiency with which network resources are used. In other examples, it may use this information to perform multicasting or data compression.

Different streams may use different protocols, such as those for media streaming and voice over IP (VoIP). The connectivity service provider can inspect the traffic going through its routers and identify how much traffic per unit time is being relayed for each protocol. Additionally, to identify similar streams, a connectivity service provider may look into the packets being relayed through its routers.

However, it is often not enough to look only at the header information of the packets of two data streams to determine whether the two streams carry the same content. For example, if a data stream is being simultaneously sent to two different end users, the destination IP addresses (and possibly the destination ports) of the packets addressed to these two end users differ. Therefore, a mechanism is required that looks into the payload of the packets of two data streams to determine that they carry the same content.

However, parsing the payload of packets being relayed through the routers of a connectivity service provider is typically very computation-intensive. Additionally, parsing packets intended for a client may violate the client's privacy. Therefore, a different method is desired for identifying similar streams from the payload of the packets being relayed through the routers of a connectivity service provider, one that does not require parsing that payload.

BRIEF SUMMARY

In an embodiment, a method is disclosed for identifying similar data streams in a connectivity service provider environment. The method may comprise selecting a predetermined number of data streams with the highest data rates among a plurality of data streams passing through a router in the connectivity service provider environment. The method may further comprise, for each of the selected data streams, receiving periodic sample packets from the router and generating a stream signature based on the sample packets. The method may further comprise, for each generated stream signature, attaching time-stamp information and an identification of the router to the stream signature, and storing the stream signatures corresponding to the selected streams in a database. Any two stream signatures stored in the database may be compared to determine whether the corresponding streams are similar.

In an embodiment, a method is disclosed for monitoring data streams through routers in a connectivity service provider environment. The method comprises retrieving, from a database, a plurality of stream signatures corresponding to a plurality of data streams relayed through a plurality of routers in the connectivity service provider environment. The method may further comprise classifying the plurality of stream signatures into a number of classes, where each class identifies a group of stream signatures that most resemble each other. The method may further comprise identifying a data stream simultaneously relayed through at least two routers in the connectivity service provider environment. The identification may be based on examining the stream signatures in a class and determining at least two stream signatures within the class that have substantially similar time-stamp information.

System and computer program product embodiments are also disclosed.

Further embodiments, features, and advantages of the invention, as well as the structure and operation of the various embodiments, are described in detail below with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated herein and form part of the specification, illustrate the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the relevant art to make and use the disclosure.

FIG. 1 is a diagram of a connectivity service provider including components for sampling data streams, providing stream signatures, and identifying streams with similar content, according to an embodiment.

FIG. 2 is a flowchart for a method of monitoring data streams at a router and providing stream signatures, according to an embodiment.

FIG. 3 is a flowchart for a method for identifying streams with similar content, according to an embodiment.

FIG. 4 is a diagram showing an example stream being relayed through at least two paths in a set of routers in a connectivity service provider, according to an embodiment.

FIG. 5 is a diagram depicting the stream identification server shown in FIG. 1 in further detail, according to an embodiment.

The drawing in which an element first appears is typically indicated by the leftmost digit or digits in the corresponding reference number. In the drawings, like reference numbers may indicate identical or functionally similar elements.

DETAILED DESCRIPTION

A method and system according to an embodiment determine whether two different streams carry similar (or identical) content. In an embodiment, data streams are monitored at a router, and a stream signature is generated for each data stream. Each stream signature may be generated based on periodically sampled packets within a stream and may correspond only to a predetermined time interval. Once a stream signature is generated, time-stamp information may be attached to it to identify the time interval to which it corresponds. Additionally, router identification information may be attached to the stream signature to identify the router at which the data stream was observed. Other types of information may additionally or alternatively be attached to a stream signature.

In an embodiment, stream signatures collected from routers within a connectivity service provider environment are grouped into multiple clusters, where each cluster includes the signatures most similar to each other. Each individual cluster may then be examined, and matching signatures within the cluster may be identified. Two matching signatures indicate identical or similar content in their corresponding data streams. The routers carrying identical or similar content may then be identified.

In an embodiment, the connectivity service provider may use information on data streams with identical or similar content to utilize its communication resources more efficiently.

FIG. 1 is a diagram of a data transmission environment 100 that includes a connectivity service provider 102. Connectivity service provider 102 includes a plurality of routers 104(a) to 104(M) that provide network connectivity to the Internet 116 for a plurality of clients 106(a) to 106(K).

Clients 106(a) to 106(K) may be end users in a residential environment, accessing the Internet through devices such as a personal computer (PC), laptop, tablet, or smart phone. Alternatively, clients may be commercial, such as a business. For example, a business may require connectivity to a data storage server to back up all the files within its system.

Clients 106(a) to 106(K) may use connectivity service provider 102 to receive data from a variety of content providers, such as streaming servers 114(a) and 114(c), data storage server 114(b), news server 114(d), and/or other servers such as a cloud server. Clients 106(a) to 106(K) may also use connectivity service provider 102 to send (upload) data to a variety of servers.

Clients 106(a) to 106(K) may, for example, reside in a local area network (LAN) confined to a particular area, such as a building. In another example, clients 106(a) to 106(K) may be in a company intranet connecting different computers in the same organization. The various computers may have web browsers or other applications that require access to resources via a network, such as a private network or the Internet. Connectivity service provider environment 102 may use TCP/IP routing protocols, and clients 106(a) to 106(K) may use publicly or privately addressable IP addresses.

To determine how to route data, the various routers 104(a) to 104(M) in connectivity service provider environment 102 can exchange messages advertising their connectivity. The messages may, for example, be BGP messages. In that example, the routers that exchange messages may be BGP peers. Using these messages, the various routers can develop routing tables that define how to route data through the network.

Routers 104(a) to 104(M) in connectivity service provider environment 102 not only exchange messages with each other but also exchange messages with at least one router on the Internet 116. The routers on the Internet 116 may comprise backbone routers and other routers that provide communication between clients 106(a) to 106(K) of connectivity service provider environment 102 and the various servers 114(a) to 114(d) using cable, fiber optics, and/or wireless communication.

Connectivity service provider 102 includes components for sampling data streams, providing stream signatures, and identifying streams with similar content, as described next, according to embodiments.

Each router 104(a) to 104(M) has access to a sample collector 108 that periodically samples data packets being relayed through the router. For example, sample collector 108 may receive a sample packet from router 104(a) every 10 seconds. Other sampling periods may alternatively be used. All the samples received from each of routers 104(a) to 104(M) that correspond to a predetermined time period, e.g., 5 minutes, are stored in a buffer. For example, router 104(a) may store in its internal memory a first buffer that includes all sample packets (sampled, e.g., every 5 or 10 seconds) received from router 104(a) between 5:00 PM and 5:05 PM on a particular day. Additionally, router 104(a) may store in its internal memory a second buffer that includes all sample packets (sampled, e.g., every 5 or 10 seconds) received from router 104(a) between 5:10 PM and 5:15 PM on the same day. Router 104(b) may likewise store in its internal memory a third buffer that includes all sample packets received from router 104(b) between 5:00 PM and 5:05 PM on the same day, and so on.
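
A minimal Python sketch of this buffering, assuming each sample arrives as a `(router_id, timestamp, packet)` tuple with the timestamp in seconds. The 5-minute buffer length is the example value from the text; aligning windows to multiples of the buffer length, and the function names, are assumptions for illustration.

```python
from collections import defaultdict

WINDOW_S = 5 * 60  # example buffer length from the text: 5 minutes


def window_start(ts: float, window_s: int = WINDOW_S) -> int:
    """Start time (in seconds) of the buffer window that contains ts."""
    return int(ts // window_s) * window_s


def buffer_samples(samples, window_s: int = WINDOW_S):
    """Group (router_id, timestamp, packet) samples into per-router, per-window buffers.

    Returns {(router_id, window_start): [(timestamp, packet), ...]} with each
    buffer sorted chronologically.
    """
    buffers = defaultdict(list)
    for router_id, ts, packet in samples:
        buffers[(router_id, window_start(ts, window_s))].append((ts, packet))
    for buf in buffers.values():
        buf.sort(key=lambda item: item[0])  # chronological order within a buffer
    return dict(buffers)
```

With one sample every 10 seconds, a 5-minute buffer holds 300 / 10 = 30 samples per sampled stream.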

In an embodiment, for each router 104(a) to 104(M), sample collector 108 receives only sample packets corresponding to the N (e.g., 1, 2, 3, or more) heaviest data streams passing through the router. The heaviest data streams may be those requiring the greatest amount of bandwidth. In the above example, with N = 3, the first buffer may include only those sample packets received from router 104(a) between 5:00 PM and 5:05 PM on a particular day that correspond to the 3 data streams having the highest rate of data transmission.
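
Selecting the N heaviest streams could look like the following sketch. It assumes the router exposes a per-stream byte count for the current interval (a hypothetical input) and ranks streams by average data rate.

```python
import heapq


def heaviest_streams(byte_counts: dict, interval_s: float, n: int = 3) -> list:
    """Keys of the n streams with the highest average data rate over the interval.

    byte_counts maps a stream key (e.g. a five-tuple) to the number of bytes
    the router relayed for that stream during the interval.
    """
    rates_bps = {key: 8 * nbytes / interval_s for key, nbytes in byte_counts.items()}
    # nlargest returns the keys ordered from heaviest to lightest.
    return heapq.nlargest(n, rates_bps, key=rates_bps.get)
```

Because all streams here share one interval, ranking by raw byte count would give the same order; the rate is computed only to make the selection criterion explicit.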

In an embodiment, each of the routers 104(a) to 104(M) determines which data streams are to be sampled by sample collector 108. In an example according to this embodiment, each router 104(a) to 104(M) has a user interface (graphical or command line). In this example, a user (such as an administrator) in connectivity service provider environment 102 configures the number of data streams that are to be sampled by sample collector 108. Additionally or alternatively, the user may determine the characteristics of the data streams that are to be sampled (e.g., streams belonging to a certain protocol or a certain service level agreement). Additionally or alternatively, the user may determine the sampling period (e.g., every 5 seconds) and the predetermined time period for which the collected sample packets are stored in one buffer (e.g., a duration of 5 minutes).

In an embodiment, sample collector 108 may be a daemon operating on each router 104(a) to 104(M) and periodically collecting packets. Alternatively, sample collector 108 may be a server separate from routers 104(a) to 104(M).

Signature generator 112 receives the sample packets collected by sample collector 108, optionally filters them, generates stream signatures based on the sample packets that were not filtered out and that were collected during a predetermined time interval (e.g., from 5:00 PM to 5:05 PM), and stores the generated signatures in signature database 110. These operations are described in detail below.

Signature generator 112 generates a stream signature based on a series of sample packets, received in chronological order, that are all associated with a data stream passing through a router, such as router 104(a), and that correspond to a predetermined time interval. The stream signature is generated based on the header information in the sample packets as well as the payload of the sample packets. The data within the payload may or may not be encrypted. In generating a stream signature, the payload of a sample packet is treated simply as a binary string; no parsing is performed on the payload.

In an embodiment, the stream signature may be generated using a hash function applied to the payload and header content of the sample packets.

In an embodiment, the stream signature may be a binary string multiple bytes in length.
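
As an illustration of a hash-based stream signature, the sketch below feeds the sampled packets, in chronological order, into BLAKE2b from Python's `hashlib`. The text does not name a hash function, so BLAKE2b, the 16-byte digest, and the function name are assumptions. Each packet is treated as an opaque byte string, so nothing is parsed or decrypted.

```python
import hashlib


def hash_signature(sample_packets, digest_size: int = 16) -> bytes:
    """Return a fixed-length binary signature for chronologically ordered samples.

    sample_packets: iterable of bytes objects (payload and, optionally, the
    header fields that do not vary between recipients).
    """
    h = hashlib.blake2b(digest_size=digest_size)
    for pkt in sample_packets:
        # A length prefix keeps [b"ab", b"c"] distinct from [b"a", b"bc"].
        h.update(len(pkt).to_bytes(4, "big"))
        h.update(pkt)
    return h.digest()
```

A cryptographic hash only matches inputs that are byte-for-byte identical. For copies of the same content sent to different clients to yield equal signatures, per-client header fields such as the destination address would have to be left out of the hashed bytes; tolerance to small differences calls for a similarity-preserving scheme such as minhash instead.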

In an embodiment, the stream signature may be generated by applying minhash to the payload and/or header content of the sample packets. Minhash is a technique for quickly estimating how similar two sets are. It estimates the Jaccard similarity coefficient of two sets, i.e., the ratio of the cardinality of their intersection to the cardinality of their union. Minhash may be used for eliminating near-duplicates among different types of data, such as web pages and images. In this embodiment, minhash may be applied to (the header and/or payload information of) the sample packets received from a data stream within a predetermined amount of time, and sample packets that contain near-identical content (header and/or payload) may be discarded. A signature may then be generated for the data stream for the predetermined period of time after such redundant sample packets have been eliminated.
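
The minhash idea can be sketched in Python as follows. Payload bytes are broken into overlapping 4-byte shingles, and each of `num_perm` seeded hash functions keeps its minimum value over the shingle set; the fraction of positions at which two signatures agree estimates the Jaccard similarity J(A, B) = |A ∩ B| / |A ∪ B|. Using seeded BLAKE2b in place of random permutations, the shingle length, and the function names are assumptions for illustration.

```python
import hashlib


def shingles(data: bytes, k: int = 4) -> set:
    """All k-byte substrings of data, treated as an opaque binary string."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def minhash(items: set, num_perm: int = 64) -> list:
    """Minhash signature: the minimum of each seeded hash function over items."""
    if not items:
        raise ValueError("cannot compute a minhash signature of an empty set")
    signature = []
    for seed in range(num_perm):
        prefix = seed.to_bytes(4, "big")  # each seed defines a different hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(prefix + item, digest_size=8).digest(), "big")
            for item in items))
    return signature


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of positions at which two minhash signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, for comparison with the estimate."""
    return len(a & b) / len(a | b)
```

Two sample payloads whose estimated similarity exceeds a chosen threshold could be treated as near-identical, and one of them discarded, as described above. The estimate's error shrinks roughly as 1/sqrt(num_perm).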

In an embodiment, once a stream signature is generated, time-stamp information is attached to the stream signature to identify the time interval to which the stream signature corresponds.

In an embodiment, once a stream signature is generated, router identification information is attached to the stream signature to identify the router at which the data stream was observed.

In an embodiment, various statistical information about the stream may be used to generate the stream signature. This statistical information may, for example, include the mean, variance, skewness, kurtosis, and scale of the data rate, as well as the minimum and/or maximum instantaneous data rate.

Alternatively, the above statistical information may be attached to the stream signature to specify the characteristics of the data stream to which the stream signature corresponds.
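
The statistics listed above could be computed from a series of per-sample data rates (for example, bits per second measured over each sampling period) with a sketch like the one below. It uses population moments; reporting kurtosis as the plain fourth standardized moment rather than excess kurtosis, and omitting the "scale" statistic because the text does not define it, are assumptions.

```python
import math


def rate_statistics(rates) -> dict:
    """Summary statistics of a stream's per-sample data rates (e.g. bits/s).

    Kurtosis is the fourth standardized moment (not excess kurtosis).
    Skewness and kurtosis are reported as 0.0 for a constant-rate stream,
    where they are undefined.
    """
    n = len(rates)
    if n == 0:
        raise ValueError("need at least one rate sample")
    mean = sum(rates) / n
    variance = sum((r - mean) ** 2 for r in rates) / n
    std = math.sqrt(variance)
    if std == 0:
        skewness = kurtosis = 0.0
    else:
        skewness = sum((r - mean) ** 3 for r in rates) / (n * std ** 3)
        kurtosis = sum((r - mean) ** 4 for r in rates) / (n * std ** 4)
    return {"mean": mean, "variance": variance, "skewness": skewness,
            "kurtosis": kurtosis, "min": min(rates), "max": max(rates)}
```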

Signature generator 112 may optionally filter sample packets. In an embodiment, signature generator 112 may discard sample packets that contain information similar to that of previously received sample packets. In an example according to this embodiment, the filtering is performed using minhash as described above. In another example according to this embodiment, a stream signature is generated by incrementally considering more (chronologically ordered) sample packets. In that case, if the stream signature based on the first n samples within the predetermined time interval does not change (or does not substantially change) after sample n+1 is considered, sample n+1 is discarded.
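
The incremental rule (discard sample n+1 if it does not substantially change the signature) can be sketched as below. To stay short, the sketch measures change with the exact Jaccard similarity between the accumulated set of 4-byte shingles before and after adding a sample, rather than with minhash; the 0.95 threshold and the function names are assumptions.

```python
def shingles(data: bytes, k: int = 4) -> set:
    """All k-byte substrings of data, treated as an opaque binary string."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def filter_samples(payloads, threshold: float = 0.95, k: int = 4):
    """Incrementally keep only samples that noticeably change the accumulated content.

    payloads: sample payloads in chronological order.
    Returns (kept_payloads, accumulated_shingles).
    """
    kept, acc = [], set()
    for payload in payloads:
        merged = acc | shingles(payload, k)
        # Jaccard(acc, merged) = |acc| / |merged|, since acc is a subset of merged.
        similarity = len(acc) / len(merged) if merged else 1.0
        if acc and similarity >= threshold:
            continue  # sample n+1 barely changes the signature: discard it
        kept.append(payload)
        acc = merged
    return kept, acc
```

A signature (for example, a minhash of the accumulated set) would then be computed from the kept samples only.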

The signature generator 112 corresponding to each router 104(a) to 104(M) stores the generated signatures of that router in signature database 110.

Stream identification server 118 retrieves the stored stream signatures and compares them to identify matching stream signatures. Matching stream signatures correspond to streams whose signatures are substantially similar and that also correspond to the same time interval. This is described further with respect to FIG. 3. Stream identification server 118 may or may not reside within connectivity service provider environment 102.

FIG. 2 is a flowchart for a method of monitoring data streams at a router and providing stream signatures, according to an embodiment.

At step 210, all streams passing through a router, such as router 104(a), are monitored, and their characteristics, such as data rate and protocol, are identified. Based on this information, at step 220, a predetermined number (“N”) of the most bandwidth-intensive (heaviest) data streams are selected for further monitoring. At step 230, periodic samples of each of the N heaviest data streams are collected. Sample collection is performed by a sample collector, such as sample collector 108.

The packet sampling period may vary based on the characteristics and/or the configuration of the components of connectivity service provider environment 102, and/or the characteristics of the data streams. For example, the sampling period may be set to 5, 10, or 15 seconds.

At step 235, sample packets are filtered. Filtering may be performed at a signature generator, such as signature generator 112. As a result of this filtering, some sample packets may be discarded.

At step 240, a stream signature is generated for each of the N selected data streams. As described above with respect to signature generator 112 in FIG. 1, for each data stream, the stream signature may be generated based on the payload and/or header content of the sample packets collected for that data stream.

In embodiments, additional information may be used in generating the stream signature. Such additional information may, for example, include statistical information about the data rate of the data stream, the source and destination address information of the received sample packets, or information on the total amount of data relayed within the predetermined period of time.

As previously described with respect to FIG. 1, sample packets that contain information similar to that of previously received sample packets may be discarded. In an example, a stream signature is generated by incrementally considering more (chronologically ordered) sample packets. In that case, if the stream signature based on the first n samples within the predetermined time interval does not change (or does not substantially change) after sample n+1 is considered, sample n+1 is discarded.

Because, as described above, sample packet filtering may be performed iteratively with stream signature generation, taking into account the most recently generated signature, FIG. 2 shows a loop between steps 230, 235, and 240.

In an embodiment, the stream signature may be generated using a hash function applied to the payload and/or header content of the sample packets. The stream signature may be a binary string multiple bytes in length.

In an embodiment, once a stream signature is generated, additional information may be attached to it. Such additional information may, for example, be time-stamp information to identify the time interval to which the stream signature corresponds, router identification information to identify the router at which the stream was observed, or information about the received packets. The information about the received packets may, for example, comprise packet size information such as the Maximum Transmission Unit (MTU) of the packets, packet protocol information (e.g., TCP, UDP, HTTP, STP, RTCP, etc.), a flag in the packet header (e.g., flags indicating packet priority or Class of Service (CoS)), and/or stream identification information such as a virtual local area network identifier (VLAN ID). Alternatively, the above information may be used to generate a stream signature.
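
One way to represent a stream signature together with its attached information, and to store it, is sketched below. The field names are hypothetical, and an in-memory SQLite table (via Python's built-in `sqlite3` module) stands in for signature database 110; the text does not prescribe a schema.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass
class StreamSignature:
    signature: bytes                       # e.g. a hash or minhash digest of the samples
    window_start: int                      # time stamp: interval start (epoch seconds)
    window_end: int                        # time stamp: interval end (epoch seconds)
    router_id: str                         # router at which the stream was observed
    protocol: Optional[str] = None         # e.g. "UDP"
    vlan_id: Optional[int] = None          # stream identification information
    mean_rate_bps: Optional[float] = None  # optional data-rate statistic


SCHEMA = """CREATE TABLE IF NOT EXISTS signatures (
    signature BLOB, window_start INTEGER, window_end INTEGER, router_id TEXT,
    protocol TEXT, vlan_id INTEGER, mean_rate_bps REAL)"""


def store(conn: sqlite3.Connection, sigs) -> None:
    """Insert signatures; column order matches the dataclass field order."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO signatures VALUES (?, ?, ?, ?, ?, ?, ?)",
                     [astuple(s) for s in sigs])
    conn.commit()


def load(conn: sqlite3.Connection) -> list:
    """Read all stored signatures back, in insertion order."""
    rows = conn.execute("SELECT * FROM signatures ORDER BY rowid")
    return [StreamSignature(*row) for row in rows]
```

For example, `store(sqlite3.connect(":memory:"), [...])` followed by `load` on the same connection returns equal `StreamSignature` objects.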

At step 250, the generated stream signatures for the selected streams are provided to a stream identification server, such as stream identification server 118. As described below with respect to method 300 in FIG. 3, the stream identification server compares different stream signatures and identifies streams that are substantially similar.

FIG. 3 is a flowchart for a method for identifying streams with similar content, according to an embodiment. In an embodiment, this method is performed by a stream identification server, such as stream identification server 118.

At step 310, a plurality of stream signatures is received from a plurality of routers. Specifically, the stream signatures may be received from signature generator(s), such as signature generator 112, within each router 104(a) to 104(M).

At step 320, the received stream signatures are classified so that the members of each class are substantially similar stream signatures. Since the number of classes is not known beforehand, an unsupervised classification method may be used. For example, clustering (e.g., k-means, hierarchical clustering, or mixture models) or unsupervised decision trees may be used to classify the stream signatures.
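
Because the number of classes is not known in advance, one simple option is threshold-based single-linkage clustering (a basic form of hierarchical clustering), sketched below for equal-length, minhash-style signatures. This is an illustrative substitute, not the specific method of any embodiment; the 0.8 threshold and the names are assumptions. The all-pairs comparison is O(n²), so a large deployment would likely need an index such as locality-sensitive hashing.

```python
def similarity(sig_a, sig_b) -> float:
    """Fraction of positions at which two equal-length signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def cluster(signatures, threshold: float = 0.8) -> list:
    """Single-linkage clustering: signatures whose similarity is >= threshold
    end up in the same class, directly or through a chain of such links.

    Returns a list of classes, each a list of indices into signatures.
    """
    parent = list(range(len(signatures)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(signatures)):
        for j in range(i + 1, len(signatures)):
            if similarity(signatures[i], signatures[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two classes

    classes = {}
    for i in range(len(signatures)):
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())
```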

At step 330, each class is further examined to identify stream signatures that also have similar time-stamp information, i.e., that correspond to substantially similar time intervals. Note that two matching stream signatures corresponding to two streams with identical content may not have identical time stamps, because the sampling process is not synchronized across all routers in the connectivity service provider environment. For example, the internal clocks of router 104(a) and router 104(b) may differ by 1 second. Therefore, a stream passing through router 104(a) during the time interval between 5:00:00 PM and 5:10:00 PM on a certain day may correspond to a stream with identical content passing through router 104(b) during the time interval between 5:00:01 PM and 5:10:01 PM. Therefore, in an embodiment, when matching stream signatures within one class, two streams with substantially the same time stamps (rather than identical time stamps) are considered matching.

Further checks may be performed on matching signatures with substantially similar time stamps. For example, in an embodiment, statistical information on the data rates of the corresponding streams is attached to the stream signatures. This allows the stream identification server to determine whether two streams are sufficiently similar. For example, if signature A and signature B are in the same cluster (class) and also have substantially the same time stamp, but the average data rate of the data stream corresponding to signature A is one tenth of the average data rate of the data stream corresponding to signature B, the stream identification server does not consider signatures A and B to be a match.
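
The time-stamp matching of step 330 and the data-rate check above can be sketched together. Within one class, two signatures are treated as matching if their interval start times differ by at most a tolerance (2 seconds here, enough to absorb the 1-second clock offset in the example) and their mean data rates differ by at most a factor of 2. Both thresholds, the `Sig` record, and the function names are assumptions.

```python
from itertools import combinations
from typing import NamedTuple


class Sig(NamedTuple):
    router_id: str
    start: int            # interval start, seconds since midnight
    mean_rate_bps: float  # attached data-rate statistic


def rates_consistent(a: float, b: float, max_ratio: float = 2.0) -> bool:
    """False if one mean rate exceeds the other by more than max_ratio."""
    if a <= 0 or b <= 0:
        return False
    return max(a, b) / min(a, b) <= max_ratio


def matching_pairs(cls, tolerance_s: int = 2, max_ratio: float = 2.0) -> list:
    """Pairs within one class with near-equal time stamps and consistent rates."""
    return [(a, b) for a, b in combinations(cls, 2)
            if abs(a.start - b.start) <= tolerance_s
            and rates_consistent(a.mean_rate_bps, b.mean_rate_bps, max_ratio)]
```

With 5:00:00 PM encoded as 61200 seconds, a signature from router 104(a) at 61200 and one from router 104(b) at 61201 with similar rates would match, while a copy whose mean rate is ten times lower would be rejected.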

In another embodiment, statistical information on the data rates of the corresponding streams is used to generate the stream signatures. Therefore, two signatures in the same cluster would also be expected to have substantially similar data rate statistics.

At step 340, once matching signatures (with similar time stamps and any other desired information) are identified, the routers relaying the streams from which the matching signatures were generated are also identified. The router information may be attached to each stream signature.

In an embodiment, based on this information, the connectivity service provider may determine a map of similar data streams traveling through its routers, and the specific paths of these data streams through the routers. An example of two identical data streams being relayed simultaneously through a connectivity service provider environment is shown in FIG. 4, which is described below.

In an embodiment, once the connectivity service provider environment identifies that the same stream is being relayed through its routers along two separate paths, it reroutes the stream to use network resources more efficiently (e.g., to avoid duplicate transmissions that allocate network resources unnecessarily).

In an embodiment, once the connectivity service provider environment identifies that the same stream is being relayed through its routers along multiple separate paths, it identifies or suspects a network security attack. The connectivity service provider environment may consequently take remedial actions to address the attack. For example, it may configure firewalls on its routers to block this data stream and/or the server from which the suspicious data streams originate. Additionally or alternatively, it may send alerts to clients (e.g., clients 106(a) to 106(c)) informing them of a security attack. Additionally or alternatively, it may log the suspicious data streams for future investigation. Additionally or alternatively, it may flag all the data streams originating from the same server as the suspicious data streams.

FIG. 4 is a diagram showing an example stream being relayed through at least two paths in a set of routers in a connectivity service provider, according to an embodiment. Specifically, stream 404 is simultaneously relayed from streaming server 114(a) to clients 106(a) and 106(b). As can be seen in FIG. 4, in a first path to client 106(a), the data stream passes through routers 104(e) and 104(a). In a second path to client 106(b), the same data stream passes through routers 104(c), 104(b), and 104(d).

In an example embodiment according to the scenario shown in FIG. 4, connectivity service provider environment 102 may decide to consolidate the two paths by relaying data stream 404 only through routers 104(c), 104(b), and 104(d), and delivering it to clients 106(a) and 106(b) from router 104(d) using unicast transmissions.

In another example according to the scenario shown in FIG. 4, connectivity service provider environment 102 consolidates the two paths by relaying data stream 404 only through routers 104(c), 104(b), and 104(d), and delivering it to clients 106(a) and 106(b) from router 104(d) by multicasting.

FIG. 5 is a diagram of the stream identification server shown in FIG. 1 in further detail, according to an embodiment.

Stream identification server 118 comprises a stream signatures retrieval module 506, a classification module 508, a similar stream identification module 504, and one or more processors 510, and may optionally comprise a user interface module 502. Each of these components is described below in turn.

Stream signatures retrieval module 506 retrieves a plurality of stream signatures from a signature database, such as signature database 110. Communication between signature database 110 and stream signatures retrieval module 506 may use wired or wireless communication techniques, such as cable, a shared bus, WiFi, or Bluetooth.

Classification module 508 classifies the retrieved stream signatures into a plurality of classes. This module implements the operations previously described with respect to step 320 in FIG. 3. In an embodiment, classification module 508 performs clustering (e.g., k-means clustering) on the retrieved stream signatures and identifies a plurality of clusters of signatures.

Similar stream identification module 504 examines each identified cluster and, within each cluster, determines matching stream signatures that have substantially similar time-stamp information. Similar stream identification module 504 may use statistical information about the streams (e.g., the average, variance, skewness, minimum, and/or maximum of the data rate within the predetermined interval identified in the time-stamp information for each data stream) to further refine the determination of matching stream signatures. Additionally, similar stream identification module 504 identifies the routers relaying the identified similar streams. Similar stream identification module 504 implements the operations described previously with respect to steps 330 and 340 of FIG. 3.

Stream identification server 118 may optionally comprise a user interface module 502 that receives configuration information from a user of the connectivity service provider environment (e.g., an administrator of the connectivity service provider or a client). The configuration information may include one or more criteria and/or parameters for performing similar stream identification. For example, the configuration information may specify that the average and variance of a stream's data rate, and the protocol information included in stream signatures, be used in identifying matching signatures. Additionally or alternatively, the configuration information may specify that k-means clustering should be used for classification, and so on.

One or more processors 510 may be used to implement, coordinate, and/or configure the component modules within stream identification server 118. The modules may be implemented using hardware, software, or a combination thereof.

The term “user,” as used herein, may encompass both a customer of the network connectivity service, such as an employee of a business that utilizes the network connectivity service, and a network administrator of the service provider itself. Users may also be at different companies or organizations.

The tables disclosed herein may be stored in any type of structured memory, including a persistent memory. In examples, each database may be implemented as a relational database or a file system.

Each of the devices and modules in FIGS. 1 and 5 may be implemented in hardware, software, firmware, or any combination thereof.

Each of the devices and modules in FIGS. 1 and 5 may be implemented on the same or different computing devices. Such computing devices can include, but are not limited to, a personal computer, a mobile device such as a mobile phone, workstation, embedded system, game console, television, set-top box, or any other computing device. Further, a computing device can include, but is not limited to, a device having a processor and memory, including a non-transitory memory, for executing and storing instructions. The memory may tangibly embody the data and program instructions. Software may include one or more applications and an operating system. Hardware can include, but is not limited to, a processor, a memory, and a graphical user interface display. The computing device may also have multiple processors and multiple shared or separate memory components. For example, the computing device may be a part of or the entirety of a clustered or distributed computing environment or server farm.

Identifiers, such as “(a),” “(b),” “(i),” “(ii),” etc., are sometimes used for different elements or steps. These identifiers are used for clarity and do not necessarily designate an order for the elements or steps.

The present invention has been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.

The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.

The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

What is claimed is:
1. A system comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the system to perform a set of operations, the set of operations comprising: sampling packets of a first data stream received at a first router of a plurality of routers in the connectivity service provider environment; generating a stream signature of the first data stream based at least on a subset of the sampled packets, the stream signature comprising an identification of the first router at which the sampled packets were received; determining, based at least in part on the identification of the router in the stream signature, that the first data stream is similar to a second data stream relayed through a second router different than the first router; and rerouting, based on the determining, at least the first data stream within the connectivity service provider environment.
2. The system of claim 1, wherein the subset of sampled packets is generated by filtering the sampled packets to exclude at least one packet based on a determination that the at least one packet does not change a stream signature for the first data stream.
3. The system of claim 2, wherein the stream signature is generated without parsing a payload of the filtered packets.
4. The system of claim 2, wherein the stream signature is generated further based at least on a source address and a destination address of at least one packet of the filtered packets.
5. The system of claim 1, wherein the stream signature is generated without parsing a payload of the subset of sampled packets.
6. The system of claim 1, wherein the stream signature is generated further based on statistical information about the first data stream comprising at least one of: mean of data rate; variance of data rate; skewness of data rate; minimum data rate; or maximum data rate.
7. The system of claim 1, wherein the first data stream is sampled according to a predetermined time interval, and wherein the first data stream is sampled for a predetermined time period.
8. A method for identifying similar data streams in a connectivity service provider environment, the method comprising: sampling packets of a first data stream received at a first router of a plurality of routers in the connectivity service provider environment; generating a stream signature of the first data stream based at least on a subset of the sampled packets, the stream signature comprising an identification of the first router at which the sampled packets were received; determining, based at least in part on the identification of the router in the stream signature, that the first data stream is similar to a second data stream relayed through a second router different than the first router; and rerouting, based on the determining, at least the first data stream within the connectivity service provider environment.
9. The method of claim 8, wherein the subset of sampled packets is generated by filtering the sampled packets to exclude at least one packet based on a determination that the at least one packet does not change a stream signature for the first data stream.
10. The method of claim 9, wherein the stream signature is generated without parsing a payload of the filtered packets.
11. The method of claim 9, wherein the stream signature is generated further based at least on a source address and a destination address of at least one packet of the filtered packets.
12. The method of claim 8, wherein the stream signature is generated without parsing a payload of the subset of sampled packets.
13. The method of claim 8, wherein the stream signature is generated further based on statistical information about the first data stream comprising at least one of: mean of data rate; variance of data rate; skewness of data rate; minimum data rate; or maximum data rate.
14. The method of claim 8, wherein the first data stream is sampled according to a predetermined time interval, and wherein the first data stream is sampled for a predetermined time period.